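Federated learning is an emerging technique for training models from decentralized datasets. In many applications, the data owners participating in a federated learning system hold not only data but also a body of domain knowledge. This knowledge includes human know-how and craftsmanship, and it can be very helpful to the federated learning task. In this work, we propose a federated learning framework that allows participants' domain knowledge to be injected, where the key idea is to refine the global model with local knowledge. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach on that application.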
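Graph neural networks (GNNs) have emerged as an effective approach to machine learning tasks, and they bring a new way of building recommender systems, in which the recommendation task is formulated as a user-item link prediction problem. Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint that easily exceeds the DRAM capacity of a typical server. Existing solutions resort to distributed subgraph training, which is inefficient because of the high cost of dynamically constructing subgraphs and the heavy redundancy across subgraphs. Emerging Intel Optane persistent memory allows a single machine to have up to 6 TB of memory at an affordable cost, making single-machine GNNRecSys training feasible and eliminating the inefficiencies of distributed training. A major concern with using Optane for GNNRecSys is Optane's relatively low bandwidth compared with DRAM. This limitation can be particularly harmful to GNNRecSys performance, because the dominant compute kernels of these workloads are sparse and memory-access intensive. To understand whether Optane is a good fit for GNNRecSys training, we perform an in-depth characterization and a comprehensive benchmarking study of GNNRecSys workloads. Our benchmarking results show that, when properly configured, Optane-based single-machine GNNRecSys training outperforms distributed training by a large margin, especially for deep GNN models. We analyze the sources of the speedup, provide guidance on how to configure Optane for GNNRecSys workloads, and discuss opportunities for further optimization.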
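In this report, we present our solution to the Occupancy and Flow Prediction challenge of the Waymo Open Dataset Challenges at CVPR 2022, which ranked first on the leaderboard. We developed a novel hierarchical spatial-temporal network with a spatial-temporal encoder, a multi-scale aggregator enriched with latent variables, and a recursive hierarchical 3D decoder. We use multiple losses, including focal loss and a modified flow loss, to guide the training process effectively. Our method achieves a Flow-Grounded Occupancy AUC of 0.8389 and outperforms all other teams on the leaderboard.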
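The report lists focal loss among its training losses. As a reference point, the sketch below implements the standard binary focal loss of Lin et al., FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single prediction such as one cell of an occupancy grid. Several details are assumptions for illustration: applying the loss per grid cell, the defaults α = 0.25 and γ = 2, and the function names. The report does not state its settings, and its modified flow loss is not reproduced here.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Standard binary focal loss for one prediction.

    p: predicted probability that the cell is occupied (positive class).
    y: ground-truth label, 1 (occupied) or 0 (free).
    alpha, gamma: common defaults from the focal-loss paper; assumed here,
    not taken from the report above.
    """
    eps = 1e-12                                  # guard against log(0)
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma down-weights well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def mean_focal_loss(probs, labels, **kwargs) -> float:
    """Average focal loss over a flat list of cells (hypothetical helper)."""
    losses = [binary_focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.1, 1))
    print(mean_focal_loss([0.9, 0.2, 0.05], [1, 0, 0]))
```

With γ = 0 and α = 0.5, the expression reduces to half the ordinary binary cross-entropy. Raising γ shrinks the loss on well-classified cells, which is the usual motivation for focal loss when easy negatives, such as empty grid cells, dominate.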
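Object detection is a typical multi-task learning application that optimizes classification and regression simultaneously. However, in anchor-based methods the classification loss always dominates the multi-task loss, which hinders consistent and balanced optimization of the tasks. In this paper, we find that shifting the bounding boxes can change the division of positive and negative samples in classification, meaning that classification depends on regression. Moreover, considering different datasets, optimizers, and regression loss functions, we summarize three important conclusions about fine-tuning the loss weights. Based on these conclusions, we propose Adaptive Loss Weight Adjustment (ALWA), which addresses the imbalance in optimizing anchor-based methods according to the statistical characteristics of the losses. By incorporating ALWA into previous state-of-the-art detectors, we achieve significant performance gains on PASCAL VOC and MS COCO, with L1, SmoothL1, and CIoU losses alike. Code is available at https://github.com/ywx-hub/alwa.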
The application of natural language processing (NLP) to cancer pathology reports has focused on detecting cancer cases, largely ignoring precancerous cases. Better characterization of precancerous adenomas helps in developing diagnostic tests for early cancer detection and prevention, especially for colorectal cancer (CRC). Here we developed transformer-based deep neural network NLP models to perform CRC phenotyping, with the goal of extracting precancerous lesion attributes and distinguishing cancer cases from precancerous ones. We achieved a macro-F1 score of 0.914 for classifying patients into negative, non-advanced adenoma, advanced adenoma, and CRC. We further improved the score to 0.923 using an ensemble of classifiers for cancer status classification and lesion size named entity recognition (NER). Our results demonstrate the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
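Macro-F1, the metric reported above, is the unweighted mean of the per-class F1 scores. Each of the four classes therefore counts equally, regardless of how many patients it contains. The following is a minimal plain-Python sketch. The function name is hypothetical, and the toy labels in the usage example are invented for illustration, not data from the study.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (classes with no support score 0)."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)   # correctly predicted c
        fp = sum(1 for t, p in pairs if t != c and p == c)   # wrongly predicted c
        fn = sum(1 for t, p in pairs if t == c and p != c)   # missed c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


CLASSES = ["negative", "non-advanced adenoma", "advanced adenoma", "CRC"]

if __name__ == "__main__":
    # Toy example: one error in "negative" and one in "CRC".
    y_true = ["negative", "negative", "non-advanced adenoma",
              "advanced adenoma", "CRC", "CRC"]
    y_pred = ["negative", "non-advanced adenoma", "non-advanced adenoma",
              "advanced adenoma", "CRC", "advanced adenoma"]
    print(macro_f1(y_true, y_pred, CLASSES))  # approximately 0.667 (2/3)
```

Because every class is weighted equally, errors on a rare class such as advanced adenoma move macro-F1 as much as errors on the most common class.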
Motion prediction is highly relevant to the perception of dynamic objects and static map elements in autonomous driving scenarios. In this work, we propose PIP, the first end-to-end Transformer-based framework that jointly and interactively performs online mapping, object detection, and motion prediction. PIP uses map queries, agent queries, and mode queries to encode the instance-wise information of map elements, agents, and motion intentions, respectively. Building on this unified query representation, we propose a differentiable multi-task interaction scheme to exploit the correlation between perception and prediction. Even without human-annotated HD maps or agents' historical tracking trajectories as guidance, PIP achieves end-to-end multi-agent motion prediction and outperforms tracking-based and HD-map-based methods. PIP provides comprehensive high-level information about the driving scene (a vectorized static map and dynamic objects with motion information), which benefits downstream planning and control. Code and models will be released to facilitate further research.
Video event extraction aims to detect salient events in a video and to identify the arguments of each event along with their semantic roles. Existing methods focus on capturing the overall visual scene of each frame and ignore fine-grained, argument-level information. Inspired by the definition of events as changes of state, we propose a novel framework that detects video events by tracking changes in the visual states of all involved arguments, which we expect to provide the most informative evidence for video event extraction. To capture these visual state changes, we decompose them into changes in pixels within objects, displacements of objects, and interactions among multiple arguments. We further propose Object State Embedding, Object Motion-aware Embedding, and Argument Interaction Embedding to encode and track these changes, respectively. Experiments on various video event extraction tasks show significant improvements over state-of-the-art models. In particular, on verb classification we achieve an absolute gain of 3.49% (19.53% relative) in F1@5 on Video Situation Recognition.
In current anchor-based detectors, a positive anchor box is intuitively assigned to the object it overlaps the most. The label assigned to each anchor directly determines the optimization direction of the corresponding prediction box, including both box regression and category prediction. In our practice with crowded object detection, however, we observe that when multiple objects overlap, a positive anchor does not always regress toward the object it overlaps the most. We call this phenomenon anchor drift. Anchor drift shows that the anchor-object matching relation, which is determined by the degree of overlap between anchors and objects, is not always optimal. Conflicts between this fixed matching relation and the experience learned earlier in training may cause ambiguous predictions and thus raise the false-positive rate. In this paper, we propose a simple but efficient adaptive two-stage anchor assignment (TSAA) method. It uses the final prediction boxes, rather than the fixed anchors, to compute the overlap with objects and thereby decide which object each anchor should regress to. Involving the prediction boxes makes the anchor-object assignment mechanism adaptive. We evaluate TSAA through extensive experiments with three classic detectors, RetinaNet, Faster-RCNN, and YOLOv3, on CrowdHuman and COCO. The results show that TSAA significantly improves detector performance without additional computational cost or changes to the network structure.
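The abstract describes the assignment only at a high level. Below is a minimal sketch of the idea. It assumes, without confirmation from the abstract, that stage one uses anchor IoU to decide which anchors are positive, and that stage two re-selects each positive anchor's target object by the IoU of that anchor's predicted box. The 0.5 threshold, the (x1, y1, x2, y2) box format, and all function names are hypothetical. This is not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign(boxes: List[Box], gts: List[Box], pos_thr: float = 0.5) -> List[int]:
    """Conventional max-IoU matching: best object index per box, or -1 if negative."""
    result = []
    for b in boxes:
        overlaps = [iou(b, g) for g in gts]
        best = max(range(len(gts)), key=lambda j: overlaps[j])
        result.append(best if overlaps[best] >= pos_thr else -1)
    return result


def two_stage_assign(anchors: List[Box], preds: List[Box], gts: List[Box],
                     pos_thr: float = 0.5) -> List[int]:
    """Hypothetical two-stage assignment in the spirit of TSAA."""
    stage1 = assign(anchors, gts, pos_thr)  # stage 1: anchors decide positives
    targets = []
    for k, obj in enumerate(stage1):
        if obj == -1:
            targets.append(-1)              # negatives stay negative
            continue
        # Stage 2: the anchor's predicted box picks which object to regress to.
        overlaps = [iou(preds[k], g) for g in gts]
        targets.append(max(range(len(gts)), key=lambda j: overlaps[j]))
    return targets


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (4, 0, 14, 10)]        # two overlapping objects A, B
    anchors = [(1, 0, 11, 10), (50, 50, 60, 60)]  # first anchor overlaps A most
    preds = [(3, 0, 13, 10), (50, 50, 60, 60)]    # but its prediction drifted to B
    print(assign(anchors, gts))                   # [0, -1]: fixed anchor matching
    print(two_stage_assign(anchors, preds, gts))  # [1, -1]: prediction-aware
```

The toy case reproduces "anchor drift". The anchor's IoU is higher with object A (about 0.82 vs 0.54), but its prediction has moved toward B, so the prediction-aware stage switches the target to B instead of pulling the box back.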
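Sequential recommendation (SR) characterizes the evolving patterns of user behavior by modeling how users transition between items. However, short interaction sequences limit the performance of existing SR methods. To address this problem, this paper focuses on cross-domain sequential recommendation (CDSR), which aims to leverage information from other domains to improve sequential recommendation performance in a single domain. Solving CDSR is challenging. On the one hand, how to preserve single-domain preferences while integrating cross-domain influence remains a fundamental problem. On the other hand, because the merged sequences have limited length, exploiting knowledge from other domains alone is insufficient to fully solve the data sparsity problem. To address these challenges, we propose DDGHM, a novel framework for CDSR consisting of two main modules: dual dynamic graph modeling and hybrid metric training. The former captures intra-domain and inter-domain sequential transitions by dynamically constructing two-level graphs, i.e., local graphs and global graphs, and combines them with a fused attentive gating mechanism. The latter enhances user and item representations through hybrid metric learning, comprising a collaborative metric that achieves alignment and a contrastive metric that preserves uniformity, to further alleviate data sparsity and improve prediction accuracy. We conduct experiments on two benchmark datasets, and the results demonstrate the effectiveness of DDGHM.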
A diagnosis-oriented dialogue system queries the patient's health condition and predicts possible diseases through continuous interaction with the patient. A few studies use reinforcement learning (RL) to learn the optimal policy over the joint action space of symptoms and diseases. However, existing RL and non-RL methods do not achieve sufficiently good prediction accuracy and remain far from its upper limit. To address this problem, we propose DxFormer, a decoupled automatic diagnostic framework that divides the diagnosis process into two steps: symptom inquiry and disease diagnosis. The transition from symptom inquiry to disease diagnosis is explicitly determined by stopping criteria. In DxFormer, we treat each symptom as a token and formulate symptom inquiry and disease diagnosis as a language generation model and a sequence classification model, respectively. We use an inverted version of the Transformer, i.e., a decoder-encoder structure, to learn symptom representations by jointly optimizing the REINFORCE reward and a cross-entropy loss. Extensive experiments on three public real-world datasets show that our model effectively learns doctors' clinical experience and achieves state-of-the-art results in symptom recall and diagnostic accuracy.